Abstract: Amplitude compression processing is used to reduce the amplitude level variations of speech to fit the reduced dynamic ranges of listeners with sensorineural hearing impairment. However, this processing results in spectral smearing, due in part to reduced peak-to-valley ratios. Presented here are two variations of a compression processing algorithm based on a sinusoidal speech model that preserves the important spectral peaks. Both variations operate on a time-varying, frequency-dependent basis to adjust to the speech variations and the listener's hearing profile. Preliminary subject tests indicate a benefit from preserving spectral contrast. Enhancing spectral contrast is also possible with the algorithm presented here.

I.  Introduction 

Sensorineural hearing losses are characterized by a reduced dynamic range of hearing and reduced spectral resolution. Compensating for elevated thresholds involves amplification to raise speech above threshold, while amplitude limiting keeps speech signals from exceeding the impaired listener's threshold of discomfort. Linear techniques apply gain directly to the amplitude of the incoming signal. Peak clipping and compression limiting are used to keep high-level signals from exceeding the listener's threshold of discomfort.

Nonlinear techniques, such as amplitude compression, reduce the amplitude level variations of the signal to fit the listener's reduced dynamic range of hearing. Single-channel (wideband) systems process the entire speech signal on the basis of overall level. Multiband syllabic compression systems reduce the variation in speech level in each frequency band according to the subject's reduced dynamic range in that band. However, differing nonlinear processing in adjacent bands can cause audible distortion. The wideband and multiband compression systems mostly use digital or analog filters along with equalization gain. In most cases, compression parameters remain the same over time. Conventional compression processes commonly reduce spectral peak-to-valley ratios in speech, which can hamper speech perception.

Waveform parameterization models, such as sinusoidal modeling, can be used in place of filter-based techniques to achieve linear or compression processing [1,2]. These models allow greater flexibility in the range of compensation processing, and lend themselves to time-varying techniques. Because the sinusoidal model allows manipulation of individual frequency components in each frame, those peaks with the greatest energy in each band can be processed so that their relative shape is maintained. For example, we have described an amplitude compression scheme that preserves the resolution of spectral peaks [1]. This system was multiband in the sense that differing amounts of compression were applied to the various frequency regions. However, the system avoided the use of physical frequency bands, which can lead to distortions from discontinuities at the boundaries between the bands. This scheme is also time varying, since the processing is optimized for each frame of speech. The real-time implementation allowed convenient testing on hearing-impaired listeners [2].

Presented here is an analysis of the processing algorithm and an alternative implementation that addresses some of the potential shortcomings of the basic model.

II.  Basic  Processing  Algorithm 

The sinusoidal model represents speech as the sum of sinusoids with various amplitudes, frequencies, and phases. This model has parameters that are independent of voicing state and pitch period. The frequencies of the sinusoids in frame k are chosen to correspond to the N(k) largest peaks in the magnitude of the short-time Fourier transform of the speech signal. The application here, which has been implemented on a TMS320C30 digital signal processor, uses 7.5-ms analysis frames and 30-ms Hamming windows, leading to a 4-to-1 time overlap. A 256-point FFT is used to provide sufficient resolution for the speech sampled at 8.013 kHz. Synthesis is done by using an inverse FFT.
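The frame arithmetic and the candidate-peak step above can be checked with a short sketch in plain Python. It is a minimal illustration, not the TMS320C30 code: the function names are hypothetical, the DFT is computed directly rather than with an FFT routine, and a "peak" is taken to mean a strict local maximum of the magnitude spectrum, which is an assumption.

```python
import cmath
import math

FS = 8013.0     # sampling rate in Hz, from the text
HOP_MS = 7.5    # analysis frame (hop) length in ms
WIN_MS = 30.0   # Hamming window length in ms
NFFT = 256      # transform size

hop = round(FS * HOP_MS / 1000)   # samples between successive frames
win = round(FS * WIN_MS / 1000)   # samples per analysis window
overlap = win / hop               # 4-to-1 time overlap

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame, nfft=NFFT):
    """|DFT| of a Hamming-windowed frame zero-padded to nfft points (bins 0..nfft/2)."""
    w = hamming(len(frame))
    x = [s * wi for s, wi in zip(frame, w)] + [0.0] * (nfft - len(frame))
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / nfft) for n in range(nfft)))
            for k in range(nfft // 2 + 1)]

def spectral_peaks(mag):
    """Bin indices of strict local maxima: the candidate sinusoids for this frame."""
    return [k for k in range(1, len(mag) - 1) if mag[k - 1] < mag[k] > mag[k + 1]]
```

At 8.013 kHz the 30-ms window is 240 samples, which fits inside the 256-point transform, and the 7.5-ms hop is 60 samples. For a 1-kHz test tone the largest peak lands in bin 32, since 1000 × 256 / 8013 ≈ 31.95.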

The processing algorithm assumes there are up to six important peaks in each frame. The number of peaks selected is determined by the constraint that the peaks must be some minimum frequency spacing from each other. If two spectral peaks are close in frequency, it is assumed that they arise from a single formant. In the example shown in Fig. 1, the top four peaks were chosen, and the processing is optimized for those peaks. The compression ratio is calculated for each principal peak in each frame as the ratio between the impaired and the normal listeners' dynamic ranges of hearing. These are represented as $\Delta^*$ and $\Delta$, respectively. The gain for a given peak is calculated such that the ratio of the peak sensation levels for normal versus impaired listeners is equal to the ratio of their respective dynamic ranges. In other words,

$$\frac{S}{\hat{S}} = \frac{\Delta}{\Delta^*} \qquad (1)$$

where $\Delta$ is the normal dynamic range and $\Delta^*$ is the impaired dynamic range at the frequency of the peak, $S$ is the sensation level of the sinusoid for normal listeners, and $\hat{S}$ is the sensation level of the processed sinusoid for impaired listeners. The amplitude of the processed peak is

$$\hat{A} = cA + T_{im} \qquad (2)$$

where $c = \Delta^*/\Delta$ is the compression ratio, $T_{im}$ is the impaired threshold of hearing, and $A$ is the amplitude of the original sinusoid peak. The resulting gain is the difference between $\hat{A}$ and $A$.
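As a worked illustration of Eqs. (1) and (2), the sketch below computes the gain for one principal peak. It assumes all levels are in dB relative to the normal threshold of hearing (dB HL), so that the normal sensation level S equals the peak amplitude A; that convention, the function name, and the example numbers are assumptions for illustration, not values from the paper.

```python
def col_peak_gain(a_db, normal_range_db, impaired_range_db, impaired_threshold_db):
    """Gain in dB for one principal peak, following Eqs. (1)-(2).

    Assumes a_db is in dB HL (normal threshold = 0 dB), so the normal
    sensation level S equals a_db.
    """
    c = impaired_range_db / normal_range_db   # compression ratio Delta*/Delta
    s_hat = c * a_db                          # impaired sensation level, from Eq. (1)
    a_hat = s_hat + impaired_threshold_db     # processed amplitude, Eq. (2)
    return a_hat - a_db                       # gain = A_hat - A
```

With Δ = 100 dB, Δ* = 50 dB, and T_im = 40 dB (made-up values), a 60-dB peak receives 10 dB of gain and a 20-dB peak receives 30 dB: softer peaks are amplified more, which is the compressive behavior the equations describe.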

Fig. 1 shows the spectrum of a speech segment. The peaks of the FFT that are used as the underlying sinusoids are indicated by a *. The squares denote the sinusoids that are selected as principal peaks. The sloping line represents the threshold of hearing $T_{im}$ for the impaired listener used in this example.

In order to preserve the relative peak-to-valley ratio of the principal peaks, the gains for those peaks are used to determine the amount of gain to apply to the sinusoids between the principal peaks. To prevent discontinuities, the gain applied to each sinusoid transitions linearly from the gain at one principal peak to the gain at the next principal peak. The resulting speech has the reduced dynamic range characteristic of compression processing, even though most of the peaks actually undergo linear gain processing. Henceforth this processing scheme will be referred to as COL, to represent its compression and linear gain aspects.
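The linear gain transition between principal peaks can be sketched as piecewise-linear interpolation over frequency. This is a minimal sketch assuming gains in dB; the paper does not say how sinusoids below the first or above the last principal peak are handled, so giving them the nearest principal peak's gain is an assumption, as is the function name.

```python
from bisect import bisect_right

def interpolate_gains(peak_freqs, peak_gains, sin_freqs):
    """Gain (dB) for every sinusoid, linear in frequency between principal peaks.

    peak_freqs must be sorted ascending. Sinusoids outside the span of the
    principal peaks take the gain of the nearest principal peak (an assumption).
    """
    gains = []
    for f in sin_freqs:
        if f <= peak_freqs[0]:
            gains.append(peak_gains[0])
        elif f >= peak_freqs[-1]:
            gains.append(peak_gains[-1])
        else:
            i = bisect_right(peak_freqs, f)   # peak_freqs[i-1] <= f < peak_freqs[i]
            f0, f1 = peak_freqs[i - 1], peak_freqs[i]
            g0, g1 = peak_gains[i - 1], peak_gains[i]
            gains.append(g0 + (g1 - g0) * (f - f0) / (f1 - f0))
    return gains
```

For example, with principal peaks at 500 Hz (10 dB) and 1500 Hz (20 dB), a sinusoid at 1000 Hz receives 15 dB, so the valley between the peaks is raised by the same kind of gain as its neighbors and the peak-to-valley shape is largely kept.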

The processed speech is synthesized by performing an inverse FFT on the modified sinusoid peaks. The processing operates on the magnitudes of the complex spectral amplitudes only; the phase is not changed. The quality of the algorithm relies on a naturally changing phase, and not on the exact values of the phase components. The 4-to-1 overlap-add process used in the synthesis smooths discontinuities at the frame boundaries. Although the peak-to-valley ratio is not perfectly maintained, it is evident from Fig. 1 that the important peaks are clearly distinguishable in the COL-processed speech and do not suffer from the smearing that accompanies conventional compression processing.
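Because only magnitudes are modified, each complex spectral value can be rebuilt from a scaled magnitude and its original phase. The sketch below shows that step and also checks why a 4-to-1 overlap-add is smooth: Hamming windows spaced a quarter-window apart sum to a constant. It is an illustration under stated assumptions, not the paper's synthesis code; in particular, the periodic Hamming window used in the check and both function names are assumptions.

```python
import cmath
import math

def apply_gain_keep_phase(z, gain_db):
    """Return z with its magnitude scaled by gain_db (in dB) and its phase unchanged."""
    return cmath.rect(abs(z) * 10 ** (gain_db / 20), cmath.phase(z))

def ola_window_sum(n, hop):
    """Steady-state sum of periodic Hamming windows of length n spaced hop samples apart.

    Returns one value per sample position within a hop; a constant result means
    overlap-add introduces no frame-rate amplitude ripple.
    """
    w = [0.54 - 0.46 * math.cos(2 * math.pi * i / n) for i in range(n)]
    return [sum(w[j + m * hop] for m in range(n // hop)) for j in range(hop)]
```

With a 240-sample window and a 60-sample hop, the four cosine terms cancel at every position, so each entry of `ola_window_sum(240, 60)` equals 4 × 0.54 = 2.16 up to rounding.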

To reduce ambient background noise in silent regions, if the maximum peak in a band is below 30 dB, it is given only the gain that would be applied to a 30-dB peak. Some gain is needed in these areas to avoid the perception of discontinuities in the signal. It is presumed, however, that the low-level peaks are not informative parts of the speech signal. This parameter is adjustable and can be optimized based on the results of clinical tests.
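The silent-region rule amounts to clamping the level used in the gain calculation. A minimal sketch, reusing the gain rule of Eqs. (1) and (2) under the same dB HL assumption; the function name and the example values are hypothetical.

```python
FLOOR_DB = 30.0   # adjustable floor from the text

def floored_gain(max_peak_db, normal_range_db, impaired_range_db,
                 impaired_threshold_db, floor_db=FLOOR_DB):
    """Gain (dB) for a band: a peak below floor_db gets the gain of a floor_db peak.

    Assumes levels in dB HL, as in the Eq. (1)-(2) sketch.
    """
    level = max(max_peak_db, floor_db)            # clamp low-level (likely noise) peaks
    c = impaired_range_db / normal_range_db       # compression ratio Delta*/Delta
    return c * level + impaired_threshold_db - level
```

With Δ = 100 dB, Δ* = 50 dB, and T_im = 40 dB, a 10-dB peak receives the 25 dB of gain assigned to a 30-dB peak rather than the 35 dB it would otherwise get, so background noise in silent regions is raised less.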


[Figure 1: speech spectrum before and after COL processing; horizontal axis: Frequency in Hz.]

Fig. 1. Speech spectrum indicating the top four spectrally important principal peaks before and after COL processing. The impaired threshold of hearing $T_{im}$ is also shown. The peaks of the FFT used as the model sinusoids are indicated by a *. The squares denote the sinusoids that are selected as principal peaks.


III.  Results  and  Analysis 

The theoretical benefits of preserving and enhancing speech spectral contrast for listeners with impaired hearing have been outlined previously [3]. The results obtained with the COL processing scheme from hard-of-hearing listeners show that preserving spectral peak-to-valley ratios can improve word and sentence understanding in quiet and in noise [4]. The COL algorithm was compared to a multiband compression (MBC) system with five bands and a 40-dB threshold, similar to the compression systems implemented in current hearing aids. Stimuli processed by both algorithms had the same long-term spectrum and overall amplitude, but contained different spectral peak-to-valley ratios. Stimuli were presented at a range of intensities, both in quiet and in noise.

Fig. 2 shows the spectra for COL and MBC processing along with the original spectrum for a segment of speech. Both processing methods clearly raise the speech to an audible range. Although the MBC signal (top spectrum) for this frame appears stronger, all processed signals were matched for rms level prior to presentation. Note the greater peak-to-valley contrast for the COL signal (middle spectrum). Note also some discontinuities in the MBC signal that occur at the boundaries between frequency bands.

Testing was conducted on COL and MBC processing using four adult listeners with moderate hearing loss (flat and sloping). Stimuli consisted of consonant-vowel-consonant (CVC) nonsense syllables under four listening conditions: in quiet and in noise, each at both soft and comfortable levels. Here, "soft" is defined as 10 dB below the most comfortable level (MCL). Results are shown in Table 1 for both soft and MCL listening levels, in quiet and in noise.


[Figure 2: MBC, COL, and original speech spectra; horizontal axis: Frequency (Hz).]

Fig. 2. Speech spectra for COL and MBC processing schemes. The top spectrum is MBC, COL is just below that, and the original spectrum is on the bottom. The impaired threshold of hearing $T_{im}$ is also shown.

At soft levels, listeners tended to perform better using COL, especially in quiet. At comfortable levels, listeners also tended to perform better using COL in noise. However, at comfortable levels with no noise, both processing methods performed equally well. Overall, all of the hearing-impaired listeners, with either flat or sloping losses, demonstrated benefit from COL processing.

The main observable differences in the outputs of the two processing schemes are in the spectral resolution and in the tendency for MBC to have higher amplitude levels in high-frequency regions for subjects with sloping losses. The reasons for the differences in spectral resolution have already been noted. COL processing tends to select primary peaks in the lower frequency regions, so the gains for high-frequency regions are often driven by what was determined in a region of lesser loss. When COL is forced to choose a primary peak above 4 kHz, the output spectrum is closer to that of MBC in that frequency region. However, in many speech segments there is not much informative speech signal above 4 kHz, so it is not clear whether this slight change in amplitude level at high frequencies has a pronounced effect. In preliminary listening tests using only COL, there did not appear to be any difference in performance when a primary peak was forced to be above 4 kHz. Recent clinical data suggest that providing amplification in the region of 4 kHz is often undesirable for listeners with severe hearing loss [5]. However, more tests need to be conducted.

The question of whether signal processing schemes that preserve or enhance spectral contrast can compensate for reduced frequency resolution was further examined in [4]. Normal-hearing listeners with simulated hearing losses were used in tests similar to the ones conducted with hearing-impaired listeners, to determine whether the benefit observed previously was primarily from increased audibility or from improved spectral resolution. Results were also compared with those obtained using original unprocessed speech.


In summary, speech processed through COL was as intelligible as the original for both simulated mild and sloping hearing losses. In fact, final consonants may have been more intelligible with COL than with the original. Speech processed through MBC appropriate for mild losses was as intelligible as the original. However, speech processed through MBC for sloping losses was less intelligible than the original or COL; MBC for this hearing loss introduces some artifacts and spectral smearing. Overall, the improvement for normal-hearing listeners was not as great as that noted by listeners with hearing loss in the previous study. Therefore, it can be concluded that the increased audibility provided by COL is not its main benefit. Rather, the results suggest that the improved spectral contrast with COL provides some compensation for the reduced spectral resolution in hearing-impaired listeners.

Table 1. Mean (and standard deviation) number of phonemes correct for 4 hard-of-hearing listeners identifying consonants and vowels in 20 syllables processed by COL or by MBC. Syllables were presented at the listeners' most comfortable levels (MCL) or at soft levels 10 dB below their MCL, in broadband noise or with no noise added. Asterisks (*) indicate a significant difference between scores obtained using COL and MBC processing.


| Condition | COL consonants | COL vowels | MBC consonants | MBC vowels |
|---|---|---|---|---|
| Soft level, noise | 4.3* (1.3) | 8.3 (2.6) | 2.8 (1.8) | 6.4 (1.5) |
| Soft level, no noise | 11.4* (3.9) | 15.3* (3.6) | 8.5 (4.0) | 11.8 (3.4) |
| MCL, noise | 6.8* (2.6) | 13.8 (2.2) | 5.0 (1.7) | 9.0 (4.1) |
| MCL, no noise | 11.6 (0.4) | 13.8 (1.0) | 9.8 (4.7) | 13.0 (3.8) |

IV.  LPC-Based  Implementation  of  Algorithm 

The basic COL algorithm selects the primary peaks through an iterative process that eliminates certain peaks from further consideration. The selected peaks are determined in part by the maximum number of peaks allowed and the minimum spacing that must be between them. The values used here were determined mostly through experimental means.
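The iterative selection described above can be sketched as a greedy loop: keep the largest remaining peak, then eliminate every candidate closer than the minimum spacing, until six peaks are chosen or none remain. The largest-first ordering, the function name, and the 300-Hz default spacing are assumptions for illustration; the paper states only that the peak count and spacing were set largely by experiment.

```python
def select_principal_peaks(freqs, amps, max_peaks=6, min_spacing_hz=300.0):
    """Greedy principal-peak selection (a sketch, not the published procedure).

    Repeatedly keeps the largest remaining peak and skips any candidate within
    min_spacing_hz of an already chosen peak, treating close peaks as one
    formant. Returns the selected frequencies in ascending order.
    """
    candidates = sorted(zip(amps, freqs), reverse=True)   # largest amplitude first
    chosen = []
    for _amp, f in candidates:
        if len(chosen) == max_peaks:
            break
        if all(abs(f - g) >= min_spacing_hz for g in chosen):
            chosen.append(f)
    return sorted(chosen)
```

For candidate peaks at 250, 400, 1200, 1300, 2500, and 3500 Hz with amplitudes 60, 55, 50, 52, 40, and 30 dB, the sketch keeps four peaks (250, 1300, 2500, 3500 Hz): the 400-Hz and 1200-Hz peaks are treated as parts of the formants already chosen. Note that nothing in this rule requires a chosen peak to be a formant, which is the weakness COL-LPC addresses.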

To test further the importance of spectral resolution in compression processing, it is desirable to enhance the peak-to-valley ratio present in the original speech, rather than simply preserving it. This requires more precision in selecting the principal peaks. In fact, theory suggests that we choose peaks at the formant frequencies rather than just at some relative maxima of the spectrum. Using the LPC spectrum as a guide is one convenient way of achieving this goal. It also ensures that the principal peaks will be appropriately distributed throughout the spectrum and not clustered in lower-frequency, higher-amplitude areas.
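One conventional way to obtain an LPC spectrum is the autocorrelation method with the Levinson-Durbin recursion; the paper does not say which method or model order it uses, so both are assumptions here, as are the function names. The sketch computes the all-pole envelope in dB and takes its local maxima as candidate formant peaks. In practice the frame would be windowed first, and an order of about 10 is common for speech sampled near 8 kHz.

```python
import math

def autocorr(x, order):
    """Autocorrelation r[0..order] of an (already windowed) frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion: A(z) = 1 + a1 z^-1 + ... + ap z^-p, plus residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= 1 - k * k
    return a, err

def lpc_envelope_db(a, err, freqs_hz, fs):
    """All-pole envelope 10*log10(err / |A(e^jw)|^2) at the given frequencies."""
    out = []
    for f in freqs_hz:
        w = 2 * math.pi * f / fs
        re = sum(c * math.cos(w * j) for j, c in enumerate(a))
        im = -sum(c * math.sin(w * j) for j, c in enumerate(a))
        out.append(10 * math.log10(err / (re * re + im * im)))
    return out

def formant_peaks(env_db, freqs_hz):
    """Frequencies of strict local maxima of the envelope (candidate formants)."""
    return [freqs_hz[i] for i in range(1, len(env_db) - 1)
            if env_db[i - 1] < env_db[i] > env_db[i + 1]]
```

Because the envelope is smooth, a local maximum of it corresponds to a resonance, so a minor ripple in a spectral valley, which a local-maximum rule on the FFT magnitude could accept, is not selected.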

The COL-LPC processing is similar in principle to the COL processing. However, rather than operating on the sinusoidal model peaks, the algorithm finds the desired gain for each principal peak of the LPC spectrum and then interpolates the LPC valley regions between peaks in the same manner as with COL. The gain for each sinusoidal peak is then chosen to match the processed LPC spectrum.
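A sketch of that last step, assuming both envelopes are in dB and sampled on the same frequency grid: each sinusoid's gain is the difference between the processed and original LPC envelopes at its frequency. Evaluating the envelopes between grid points by linear interpolation is an assumption of this sketch, and the function names are hypothetical.

```python
from bisect import bisect_right

def envelope_at(freq_grid, env_db, f):
    """Linearly interpolate a dB envelope sampled on an ascending frequency grid."""
    if f <= freq_grid[0]:
        return env_db[0]
    if f >= freq_grid[-1]:
        return env_db[-1]
    i = bisect_right(freq_grid, f)          # freq_grid[i-1] <= f < freq_grid[i]
    f0, f1 = freq_grid[i - 1], freq_grid[i]
    return env_db[i - 1] + (env_db[i] - env_db[i - 1]) * (f - f0) / (f1 - f0)

def col_lpc_sinusoid_gains(freq_grid, orig_env_db, proc_env_db, sin_freqs):
    """Gain (dB) per sinusoid so that it follows the processed LPC envelope."""
    return [envelope_at(freq_grid, proc_env_db, f) - envelope_at(freq_grid, orig_env_db, f)
            for f in sin_freqs]
```

Adding these gains to the sinusoid amplitudes makes the sinusoids follow the processed LPC envelope, while their fine structure relative to that envelope is left as it was.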

Fig. 3 shows the selected peaks and processed speech for both COL and COL-LPC processing. In this example, the COL algorithm chose a principal peak that is actually in a valley region, since it is a local maximum and is the required distance from another chosen peak. It also tends to favor lower frequency regions. In contrast, COL-LPC will only select principal peaks that are actual peaks in the spectrum (formants). There are only slight differences in the spectra; however, the higher-frequency portion of the COL-LPC spectrum is slightly higher in amplitude. These differences are very difficult to detect in listening tests.

V.  Discussion  and  Conclusions 

The  characteristics  of  sensorineural  hearing  loss, 
particularly  in  the  case  of  sloping  losses,  suggest  that 
amplitude  compression  processing  could  benefit  many 
hearing-impaired listeners. To date, most of the test results with conventional multiband processing have been disappointing. This may be due to the implementation of multiband processing, and not its inherent properties.
Presented  here  are  two  variations  of  a  processing  algorithm 
that  exploit  the  strengths  of  the  sinusoidal  model  to  produce 
high  quality  amplitude  compressed  speech  that  is  within  the 
residual  dynamic  range  of  the  impaired  listener  and  has 
strong  resolution  of  the  important  spectral  peaks.  Both 
operate  on  a  time- varying  basis  to  adjust  to  the 
characteristics  of  the  speech  in  each  frame,  and  on  a 
frequency-dependent  basis  to  accommodate  the  shape  of  the 
hearing  loss. 

Another potential benefit of both COL and COL-LPC is that they allow fast compression without the discontinuities that occur in filter-based systems. The basilar membrane accomplishes its compression instantaneously, which suggests that an effective processing system should come close to instantaneous compression. Because of the distortions that accompany multiband filter-based systems, listeners tend to prefer longer time constants that cause fewer audible changes in the signal.

Testing has not been conducted on COL-LPC; however, we expect it to perform similarly to the basic COL algorithm when no additional spectral enhancement is performed. The flexibility of COL-LPC offers a unique opportunity to conduct further research on the benefits of spectral enhancement in combination with compression processing. Future plans call for extensive testing under a variety of listening conditions using the real-time system.
Fig. 3. Principal peaks and spectra for COL-LPC (top panel) and COL (bottom panel). The COL-LPC spectrum is overlaid with the LPC spectrum.